Amendments to the Claims 

Kindly amend claims 12, 13 & 20, and cancel claims 1-11 & 21-30 (without prejudice), 
as set forth below. All pending claims are reproduced below, with changes in the amended 
claims shown by underlining (for added matter) and strikethrough/double brackets (for deleted 
matter). 

1-11. Canceled. 

12. (Currently Amended) A method of processing comprising: 

providing, by a dedicated collective offload engine coupled to a switch fabric in a 
distributed, parallel computing system, [[environment,]] collective processing of data
from at least some processing nodes of multiple processing nodes of the distributed,
parallel computing system [[environment]], the dedicated collective offload engine being
a hardware device coupled to the switch fabric, the hardware device being a specialized 
device dedicated to providing collective processing in hardware of data from the at least 
some processing nodes, the collective processing implementing a collective operation on 
the data from the at least some processing nodes;

producing, by the dedicated collective offload engine, a result based on said 
collective processing; and 

forwarding said result to at least one processing node of the multiple processing
nodes.

13. (Currently Amended) The method of claim 12, wherein the [[dedicated collective
offload engine is implemented as a hardware device coupled to the switch fabric]] collective
operation is a Message Passing Interface (MPI) collective operation.

14. (Original) The method of claim 12, further comprising:

receiving and storing, at a payload memory, the data from the at least some 
processing nodes of the multiple processing nodes, wherein said payload memory is a 
component of the dedicated collective offload engine; and 

retrieving and performing, at an arithmetic logic unit (ALU), the collective 
processing of data stored in the payload memory, wherein said ALU is a component of 
the dedicated collective offload engine and is coupled to the payload memory. 

15. (Original) The method of claim 14, further comprising: 

controlling the collective processing of the data from the at least some processing 
nodes of the multiple processing nodes, wherein said controlling is performed by a 
dispatcher of the dedicated collective offload engine coupled to the ALU, and in 
communication with the at least some processing nodes of the multiple processing nodes 
via the switch fabric; and 

controlling, by the dispatcher, the sharing of the result with the at least one 
processing node of the multiple processing nodes. 

16. (Original) The method of claim 15, further comprising: 

storing, in at least one task table coupled to the dispatcher, task identification 
information related to the at least some processing nodes of the multiple processing 
nodes, wherein said at least one task table is a component of the dedicated collective 
offload engine; and 

storing, in at least one synchronization group table coupled to the dispatcher, 
identification information related to one or more groups of the at least some processing 
nodes of the multiple processing nodes, wherein said at least one synchronization group 
table is a component of the dedicated collective offload engine. 

17. (Original) The method of claim 15, further comprising:

communicating, via an adapter, across the switch fabric using a link protocol, 
wherein said adapter is coupled to the switch fabric and is a component of the dedicated 
collective offload engine; and 

facilitating, by interface logic, communication between said adapter and said 
payload memory and between said adapter and said dispatcher, wherein said interface 
logic is a component of the dedicated collective offload engine. 

18. (Original) The method of claim 12, further comprising: 

communicating among a plurality of dedicated collective offload engines via the 
switch fabric, wherein said communicating facilitates the collective processing of data 
from the at least some processing nodes of the multiple processing nodes and the 
producing of the result based thereon. 

19. (Original) The method of claim 12, further comprising: 

communicating among a plurality of dedicated collective offload engines via a 
channel disposed therebetween, said channel being independent of the switch fabric, 
wherein said communicating facilitates the collective processing of data from the at least 
some processing nodes of the multiple processing nodes and the producing of the result 
based thereon. 

20. (Currently Amended) The method of claim 12, wherein said providing collective 
processing includes executing the [[at least one]] collective operation for the at least some
processing nodes of the multiple processing nodes without using a software tree. 

21-30. Canceled. 

***** 